Shortest Unique Substring Query Revisited

Authors

  • Atalay Mert Ileri
  • M. Oguzhan Külekci
  • Bojian Xu
Abstract

We revisit the problem of finding a shortest unique substring (SUS), proposed recently by [6]. We propose an optimal O(n) time and space algorithm that finds an SUS for every location of a string of size n. Our algorithm significantly improves on the O(n^2) time complexity needed by [6]. We also support finding all the SUSes covering every location, whereas the solution in [6] can find only one SUS for every location. Further, our solution is simpler and easier to implement, and it can also be more space efficient in practice, since we use only the inverse suffix array and the longest common prefix array of the string, while the algorithm in [6] uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study showing that our algorithm is much faster and more space-saving than the one in [6].
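To make the role of the inverse suffix array and the longest common prefix (LCP) array concrete, the following is a minimal sketch, not the authors' linear-time algorithm. It assumes 0-based positions, builds the suffix array by naive sorting, and combines candidates with a quadratic scan for readability. All function names (`suffix_array`, `rank_and_lcp`, `shortest_unique_from`, `sus_all`) are hypothetical. The underlying observation is standard: the shortest unique substring starting at position i is one character longer than the longer of the two LCP values between suffix i and its neighbours in the suffix array, provided it still fits inside the string.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes: fine for a demo, not linear time.
    return sorted(range(len(s)), key=lambda i: s[i:])


def rank_and_lcp(s, sa):
    # rank is the inverse suffix array: rank[i] = position of suffix i in sa.
    # lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r]
    # (Kasai-style computation); lcp[0] = 0.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return rank, lcp


def shortest_unique_from(s):
    # lsus[i] = length of the shortest unique substring starting at i,
    # or None if every substring starting at i repeats elsewhere.
    n = len(s)
    sa = suffix_array(s)
    rank, lcp = rank_and_lcp(s, sa)
    lsus = []
    for i in range(n):
        r = rank[i]
        right = lcp[r + 1] if r + 1 < n else 0
        length = max(lcp[r], right) + 1
        lsus.append(length if i + length <= n else None)
    return lsus


def sus_all(s):
    # For each position k, return the leftmost shortest unique substring
    # covering k. Quadratic scan over start positions, for clarity only.
    n = len(s)
    lsus = shortest_unique_from(s)
    result = []
    for k in range(n):
        best = None  # (length, start)
        for i in range(k + 1):
            if lsus[i] is None:
                continue
            # Extend the candidate to the right until it covers k.
            length = max(lsus[i], k - i + 1)
            if best is None or length < best[0]:
                best = (length, i)
        length, start = best
        result.append(s[start:start + length])
    return result


print(sus_all("abab"))  # ['aba', 'ba', 'ba', 'bab']
```

In "abab", for example, "ab" and "b" both occur twice, so the shortest unique substring covering the last position is "bab". Extending a unique substring keeps it unique, which is why stretching a candidate to the right until it covers k is safe.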

Similar resources

Shortest Unique Queries on Strings

Let D be a long input string of n characters (from an alphabet of size up to 2^w, where w is the number of bits in a machine word). Given a substring q of D, a shortest unique query returns a shortest unique substring of D that contains q. We present an optimal structure that consumes O(n) space, can be built in O(n) time, and answers a query in O(1) time. We also extend our techniques to solve s...

Shortest Unique Substring Queries on Run-Length Encoded Strings

We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i ≤ s ≤ t ≤ j) if for any i′ ≤ s ≤ t ≤ j′ with j − i > j′ − i′, S[i′..j′] occurs at least twice in S. Given a run-length encoding of size m of a string of len...

Shortest unique palindromic substring queries in optimal time

A palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s, t], and every palindromic substring of S which contains interval [s, t] and is shorter than P occurs at least twice in S. The SUPS...

Tight bound on the maximum number of shortest unique substrings

A substring Q of a string S is called a shortest unique substring (SUS) for position p in S, if Q occurs exactly once in S, this occurrence of Q contains position p, and every substring of S which contains position p and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query position p all the SUSs for position p can ...

Tight Bounds on the Maximum Number of Shortest Unique Substrings

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs...

Publication date: 2014